fix(gateway): aggregate component /logs and /bulk-data from hosted apps' fqns#394
Open
fix(gateway): aggregate component /logs and /bulk-data from hosted apps' fqns#394
Conversation
Contributor
There was a problem hiding this comment.
Pull request overview
Fixes empty results for synthetic/runtime-discovered components on /logs and /bulk-data/rosbags by aggregating from hosted apps’ effective FQNs (matching existing AREA/FUNCTION aggregation behavior) and updates docs/tests accordingly.
Changes:
- Update
LogHandlers::handle_get_logsto aggregate component logs from hosted apps and emit component-level aggregation metadata. - Update
BulkDataHandlers::get_source_filtersto aggregate component bulk-data sources from hosted apps with manifest fallback behavior. - Add integration tests for component aggregation behavior and update REST API documentation for component logs aggregation.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
| src/ros2_medkit_gateway/src/http/handlers/log_handlers.cpp | Aggregate component logs from hosted apps’ FQNs + add x-medkit aggregation metadata. |
| src/ros2_medkit_gateway/src/http/handlers/bulkdata_handlers.cpp | Extend hosted-app aggregation to component source filters with fallback behavior. |
| src/ros2_medkit_integration_tests/test/features/test_logging_api.test.py | Add integration test validating non-empty aggregated component logs + metadata. |
| src/ros2_medkit_integration_tests/test/features/test_bulk_data_api.test.py | Add integration test validating aggregated component bulk-data descriptors. |
| docs/api/rest.rst | Document component logs aggregation and fallback behavior. |
Synthetic / runtime-discovered components have an empty fqn (no manifest
namespace), so the existing GET /components/{id}/logs path - which fed
entity.fqn to LogManager::get_logs with prefix_match=true - silently
returned zero items even when 20+ active nodes were mapped to that
component. Manifest-defined components with a real fqn worked because
their fqn matched the per-node logger name buffers.
Mirror the AREA / FUNCTION semantics already in this handler: look up
hosted apps via cache.get_apps_for_component(), build their effective
fqns, and call get_logs(host_fqns, prefix_match=false). Fall through to
the original namespace prefix path only when the component has no
hosted apps but still has a non-empty fqn (manifest deployments where
the component groups topics rather than nodes).
Response now also carries x-medkit.aggregation_level=component plus
aggregation_sources / app_count, parity with how AREA and FUNCTION
already advertise their aggregation. Adds an integration test verifying
component logs aggregate child app entries (verifies REQ_INTEROP_061).
…fqns
The same synthetic-component bug already fixed for /components/{id}/logs
also applies to BulkDataHandlers::get_source_filters. Synthetic /
runtime-discovered components have empty fqn AND empty namespace_path,
so the FUNCTION-only aggregation arm was not enough: components silently
returned zero source filters, producing empty rosbag descriptor lists
and failing ownership checks on download.
Extend the aggregation arm to handle COMPONENT identically - resolve
hosted apps via cache.get_apps_for_component() and build their
effective_fqn() list. Manifest deployments where a component groups
topics fall through to the existing namespace-prefix path.
Strengthen the integration test for the original logs fix: assert items
is non-empty (the bug returned 0 items with valid x-medkit metadata, so
app_count alone is insufficient) and match aggregation_sources by FQN
substring instead of bare app id. Add an integration test for component
bulk-data aggregation, and update docs/api/rest.rst to describe the
actual aggregation behavior for /components/{id}/logs.
…olve_app_host_fqns
Address review feedback on the component log/bulk-data aggregation fix:
- Extract the per-app FQN resolution loop (cache lookup + effective_fqn +
empty-skip) to HandlerContext::resolve_app_host_fqns and call it from
log_handlers (FUNCTION/AREA/COMPONENT branches) and bulkdata_handlers
(FUNCTION/COMPONENT). Removes three duplicate copies of the same loop
and gives both handlers a single unit-tested helper.
- Split FUNCTION and COMPONENT branches in BulkDataHandlers::get_source_filters.
The previous combined arm fell through to entity.fqn / namespace_path for
both types when host_fqns was empty, which silently changed FUNCTION
semantics: functions are pure aggregated views and must return an empty
filter list when no apps host them. COMPONENT keeps the fall-through for
manifest-only deployments.
- Make BulkDataHandlers::get_source_filters public (was private). The
branching logic is non-trivial and needs unit coverage; calling it
through the REST surface forces a real FaultManager service which is
not viable in unit tests.
- Add unit tests:
* 7 ResolveAppHostFqnsTest cases pinning the silent-skip semantics
(missing apps, empty effective_fqn, bound_fqn precedence).
* 9 BulkDataSourceFiltersTest cases covering APP / AREA / FUNCTION /
COMPONENT branching, including the regression guard that FUNCTION
without hosts returns empty even if the entity carried a fqn.
* 3 ComponentLogs* cases on the existing AreaAggregationTest fixture
that exercise the COMPONENT branch in handle_get_logs end-to-end via
the REST server: synthetic component aggregates from hosted apps,
manifest-only component falls through, empty component returns empty
items without crashing.
- Set ROS_DOMAIN_ID for test_bulkdata_handlers (now stands up a real
GatewayNode, needs domain isolation per CMakeLists.txt convention).
- Strengthen the integration test for component logs aggregation:
* Wrap the assertion in poll_endpoint_until - the /rosout buffer fills
asynchronously after node startup, the prior immediate assertion was
timing-dependent.
* Match aggregation_sources by FQN substring instead of bare app id
(the response carries full FQNs like /powertrain/engine/temp_sensor).
- Clarify docs/api/rest.rst: app_count and aggregation_sources are
populated only when hosted-app aggregation is active and are omitted
under the namespace-prefix fallback.
Contributor
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 11 out of 11 changed files in this pull request and generated 2 comments.
Comments suppressed due to low confidence (1)
src/ros2_medkit_gateway/test/test_handler_context.cpp:57
jsonis used throughout this test file (e.g.,json::parse(...)) but theusing json = nlohmann::json;alias was removed and there is no replacement alias/namespace import. This will not compile. Re-introduce the alias (or fully-qualifynlohmann::json) in this file.
using namespace ros2_medkit_gateway;
using namespace ros2_medkit_gateway::handlers;
// =============================================================================
// HandlerContext static method tests (don't require GatewayNode)
// =============================================================================
TEST(HandlerContextStaticTest, SendErrorSetsStatusAndBody) {
httplib::Response res;
HandlerContext::send_error(res, 400, ERR_INVALID_REQUEST, "Test error message");
EXPECT_EQ(res.status, 400);
EXPECT_EQ(res.get_header_value("Content-Type"), "application/json");
auto body = json::parse(res.body);
EXPECT_EQ(body["error_code"], ERR_INVALID_REQUEST);
EXPECT_EQ(body["message"], "Test error message");
7f8884f to
778dd0f
Compare
Address two review findings: - Move ``get_source_filters`` back to ``private``. Extract the pure entity-type branching to ``handlers::detail::compute_bulkdata_source_filters``, a free function that takes a ``ThreadSafeEntityCache`` reference plus an ``EntityInfo``. The instance method becomes a one-line wrapper that fetches the cache from ``ctx_`` and delegates. This keeps the handler's public API unchanged while letting the unit tests target the helper directly without spinning up a ``GatewayNode`` (the previous fixture stood up a real ``GatewayNode`` purely to obtain a cache reference). - Add ``#include <iterator>`` to ``log_handlers.cpp`` for ``std::make_move_iterator`` (was relying on transitive inclusion). Test rewrite: ``BulkDataSourceFiltersTest`` now constructs a bare ``ThreadSafeEntityCache``, calls ``cache_.update_all(...)``, and invokes ``handlers::detail::compute_bulkdata_source_filters`` directly. No DDS context, no rclcpp init, no ``ros2_medkit_test_utils`` shared library required. Drops the ``medkit_set_test_domain`` line for the same reason - this fixture no longer creates a ROS node, so domain isolation is moot. All 9 entity-type cases (synthetic component, manifest fallback, function without hosts regression guard, etc.) still cover the same scenarios.
…-error The rosdep keys for ``ament_cmake_clang_tidy`` and ``ament_cmake_clang_format`` are not registered for noble (the Rolling runner image), so ``rosdep install`` hard-fails before the build can start. Linters / clang-tidy already run only in the Jazzy job in ``quality.yml`` (gated by ``ENABLE_CLANG_TIDY`` in ``ROS2MedkitLinting.cmake``), so the build-and-test matrix on Humble + Rolling does not need those CMake packages installed - skip the keys instead of trying to resolve them. ``opcua-plugin.yml`` already follows the same pattern. Drop ``continue-on-error`` for the Rolling job. Rolling is now treated as a required-green distro per CLAUDE.md - any Rolling-specific failure is a real breakage that needs a code-side fix or an explicit owner sign-off, not a silent soft-fail.
ci.yml skips the ``ament_cmake_clang_format`` rosdep key on Humble + Rolling (linters only run in the Jazzy quality job, the rosdep key isn't even registered for noble), so ``find_package(... REQUIRED)`` hard-fails the configure step on those runners. Switch all 8 affected packages to ``find_package(ament_cmake_clang_format QUIET)`` plus an ``if(_FOUND)`` guard around the corresponding ``ament_clang_format(...)`` call. On Jazzy (where the package is installed in the quality job's rosdep step) the behavior is unchanged - clang-format still runs and passes 310 tests on ``ros2_medkit_gateway``. On Humble + Rolling the configure step now proceeds without the package and the linter is silently skipped, which matches the project policy that linters only run on Jazzy. The companion ``ament_cmake_clang_tidy`` find_package in ``ROS2MedkitLinting.cmake`` is already gated by the ``ENABLE_CLANG_TIDY`` option (default OFF; only set ON in the Jazzy clang-tidy job in quality.yml), so no change needed there.
…test job The opcua-plugin Unit tests matrix runs on humble + jazzy + rolling. On noble (Rolling) the ament_cmake_clang_format / ament_cmake_clang_tidy rosdep keys are not registered, so the previous skip-keys list (which covered only nav2_msgs) caused rosdep install to fail before any test could run. Add the linter keys to skip-keys, mirroring the change just made in ci.yml. The upstream medkit packages have already been switched to QUIET + FOUND for those find_package calls so the build degrades gracefully when the linter packages aren't installed.
The first RUN clears ``/var/lib/apt/lists/*`` to keep the builder image small. The next RUN then calls ``rosdep install``, which in turn invokes ``apt-get install`` under the hood without first refreshing the apt cache. On cold Docker builds this fails with ``E: Unable to locate package python3-jsonschema`` (declared as a ``<test_depend>`` by ``ros2_medkit_integration_tests`` and pulled in by rosdep even with ``BUILD_TESTING=OFF``). The OPC-UA plugin Integration jobs were masking this on main via warm BuildKit layer caches and only blew up after the cache was evicted. Add an explicit ``apt-get update`` at the start of the rosdep RUN, and move the ``rm -rf /var/lib/apt/lists/*`` cleanup to the end of the same layer so the runtime image still inherits no apt cache.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Fix two handlers that silently returned empty results for synthetic / runtime-discovered components, by mirroring the AREA / FUNCTION aggregation pattern already present in the same files.
LogHandlers::handle_get_logs- the COMPONENT branch fedentity.fqntoLogManager::get_logs(..., prefix_match=true). Synthetic components have an emptyfqn, so the prefix match selected nothing and the response was an emptyitemslist even when the component grouped N active apps. Resolve hosted apps viacache.get_apps_for_component()and callget_logs(host_fqns, prefix_match=false). Manifest deployments with a non-emptyfqnand no hosted apps fall through to the original namespace-prefix path. Response now also carriesx-medkit.aggregation_level=component,app_count, andaggregation_sourcesfor parity with AREA and FUNCTION.BulkDataHandlers::get_source_filters- identical pattern. The handler had a FUNCTION-only aggregation arm and fell through toentity.fqn/entity.namespace_pathfor COMPONENT. Both empty for synthetic components, so source filters were{}, descriptor lists came back empty, and ownership checks inhandle_downloadshort-circuited. Extended the aggregation arm to handle COMPONENT identically; manifest-only components still fall through to the namespace prefix path.docs/api/rest.rstupdated to describe the actual aggregation behavior for/components/{id}/logsinstead of the stalenamespace prefix matchwording.Tests
test_component_get_logs_aggregates_child_appsintest_logging_api.test.pyverifies a synthetic component returns non-emptyitemsplus the new aggregation metadata. Assertslen(items) > 0directly because the original bug returned 0 items with otherwise validx-medkitmetadata, so checkingapp_countalone is not sufficient. Matchesaggregation_sourcesby FQN substring instead of bare app id.test_bulk_data_component_aggregates_child_appsintest_bulk_data_api.test.pyverifies the equivalent fix for the bulk-data endpoint.SOVD API impact
x-medkitextensions on/components/{id}/logsonly - new optionalaggregation_level,aggregated,app_count,aggregation_sourcesfields, emitted only when local aggregation is active. Existing clients that ignore unknownx-medkit.*fields are unaffected./components/{id}/bulk-data/rosbagspayload schema is unchanged.Issue
A related but distinct symptom on
handle_get_faultfor synthetic components is a scope-leak that needs a different fix (changing the GetFault service contract), tracked separately - see comment on #380.Type
x-medkitaggregation metadata on/components/{id}/logs)docs/api/rest.rstlog endpoints description)Testing
All local runs green:
./scripts/test.sh unit --packages-select ros2_medkit_gateway- 2099 assertions across 81 tests, 0 failures../scripts/test.sh test_logging_api --packages-select ros2_medkit_integration_tests- 22 logging tests + shutdown, 0 failures. Includes the newtest_component_get_logs_aggregates_child_apps../scripts/test.sh test_bulk_data_api --packages-select ros2_medkit_integration_tests- 10 tests, 0 failures. Includes the newtest_bulk_data_component_aggregates_child_apps../scripts/test.sh lint --packages-select ros2_medkit_gateway ros2_medkit_integration_tests- clean (clang-format, cmake-lint, ament-copyright, flake8, pep257, cppcheck, xmllint).Checklist
x-medkitextensions on/components/{id}/logs)